Floating-Point Arithmetic
Table of Contents
Docs for Reference
- What Every Computer Scientist Should Know About Floating-Point Arithmetic, by David Goldberg ./What Every Computer Scientist Should Know About Floating-Point Arithmetic.pdf
- Single-precision floating-point format(wikipedia) http://en.wikipedia.org/wiki/Single_precision
- Double-precision floating-point format http://en.wikipedia.org/wiki/Double_precision
- Floating point http://en.wikipedia.org/wiki/Floating_point
The problem of the Floating-Point
For example, the real number ‘.37’ cannot be represented exactly by the arithmetic series described above so, if you assign this number to a floating point, the value stored could actually be ‘0.370000004’. This can be seen easily if we write a simple program that prints a floating point value to a lot of decimal places.
// some code to print a floating point number to a lot of // decimal places int main() { float f = .37; printf("%.20f\n", f); }
More examples:
Rounding Error
Floating-point Formats
Several different representations of real numbers have been proposed, but by far the most widely used is the floating-point representation.1Floating-point representations have a base β (which is always assumed to be even) and a precision p. If β = 2 and p = 24, then the decimal number 0.1 cannot be represented exactly, but is approximately 1.10011001100110011001101 × 2-4.
In general, a floating-point number will be represented as ± d.dd… d ×βe, where d.dd…d is called the significand and has p digits. More precisely ±d0.d1 d2 … dp-1 ×βe represents the number
± (d0+d1β-1+…+dp-1β-(p-1))βe, (0< di <β)
Footnotes:
Examples of other representations are floating slash and signed logarithm [Matula and Kornerup 1985; Swartzlander and Alexopoulos 1975].